Published online 28 November 2013 



Nucleic Acids Research, 2014, Vol. 42, Database issue D643-D648 

doi:10.1093/nar/gkt1209



The SILVA and "All-species Living Tree Project 
(LTP)" taxonomic frameworks 

Pelin Yilmaz, Laura Wegener Parfrey, Pablo Yarza, Jan Gerken,
Elmar Pruesse, Christian Quast, Timmy Schweer, Jörg Peplies, Wolfgang Ludwig
and Frank Oliver Glöckner

Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology,
D-28359 Bremen, Germany; Department of Botany, University of British Columbia, Vancouver V6T 1Z4,
Canada; Department of Zoology, University of British Columbia, Vancouver V6T 1Z4, Canada; Ribocon
GmbH, D-28359 Bremen, Germany; School of Engineering and Science, Jacobs University Bremen gGmbH,
D-28759 Bremen, Germany; and Lehrstuhl für Mikrobiologie, Technische Universität München, D-853530
Freising, Germany

Received September 24, 2013; Revised and Accepted November 4, 2013 



ABSTRACT 

SILVA (from Latin silva, forest, http://www.arb-silva. 
de) is a comprehensive resource for up-to-date 
quality-controlled databases of aligned ribosomal 
RNA (rRNA) gene sequences from the Bacteria, 
Archaea and Eukaryota domains and supplementary 
online services. SILVA provides a manually curated 
taxonomy for all three domains of life, based on 
representative phylogenetic trees for the small- 
and large-subunit rRNA genes. This article 
describes the improvements the SILVA taxonomy 
has undergone in the last 3 years. Specifically, we
focus on the curation process, the various
resources used for curation and the comparison of
the SILVA taxonomy with the Greengenes and RDP-II
taxonomies. Our comparisons not only revealed a
reasonable overlap between the taxa names, but
also pointed to significant differences in both
names and numbers of taxa between the three
resources.

IMPORTANCE OF A TAXONOMIC FRAMEWORK 
FOR MICROBIOLOGY 

Most life on earth is microbial, belonging to the 'Bacteria'
and 'Archaea' domains (1), and to numerous lineages of
microbial 'Eukaryota' (e.g. protists) (2). Less than 1% of
microbes are cultivable, and therefore diversity was vastly
underestimated by traditional microbiological methods
(3). The known extent of microbial diversity has grown
and continues to grow rapidly as sequence-based methods
are used to characterize microbes (4). One of the major
breakthroughs in the study of the diversity of microbes
was the use of ribosomal RNA (rRNA) gene
sequences, particularly of the small subunit (SSU; also
called 16S rRNA for Bacteria and Archaea and 18S
rRNA for Eukaryota). For the first time, direct comparisons
between divergent microbial lineages became possible
and the evolutionary relationships among all microorganisms
could be elucidated (1,5,6), leading to a unified
three-domain taxonomy (7). In conjunction with this molecular
framework for taxonomy, ecological surveys of rRNA
gene diversity in the environment have made it possible
to appreciate the extent of microbial diversity present on
earth (8). Appropriate taxonomic classification in
sequence databases is crucial for organizing and cataloging
microbial diversity. The SILVA rRNA gene databases
use a phylogenetic tree-guided manual curation approach
for the taxonomy of Bacteria, Archaea and Eukaryota.
The eukaryotic taxonomy has recently undergone extensive
curation to reflect consensus views on evolutionary
relationships among the major eukaryotic lineages,
which are predominantly microbial.

With this article we would like to express our gratitude
to Prof. Dr Jean Euzéby and to honor his tireless work in
providing the 'List of Prokaryotic Names with Standing in
Nomenclature (LPSN)'. Since 1997, he has manually
checked all issues of the International Journal of Systematic
and Evolutionary Microbiology (IJSEM) to extract the
taxonomic information (such as new species, new combinations
and emendations), classify it in an orderly manner
and make it electronically available in LPSN (9).
Furthermore, LPSN compiles information provided by
the Taxonomic Outline of Bacteria and Archaea
(TOBA), the NCBI taxonomy (10), the Taxonomic
Outlines of the Bergey's Manual of Systematic
Bacteriology (11) and suggestions made by 'The All-Species
Living Tree Project (LTP)' (12). Finally, we
would also like to thank Dr Aidan C. Parte for taking
over Prof. Dr Euzéby's tasks and continuing the LPSN
resource (13).

NOMENCLATURE AND CLASSIFICATION 

The current nomenclature of Bacteria and Archaea is 
officially regulated by the International Committee on 
Systematics of Prokaryotes (ICSP). The ICSP is the or- 
ganization responsible for editing the Bacteriological 
Code (14), which is an official compilation of principles, 
rules and recommendations for naming new taxa and 
renaming existing taxa. Valid taxonomic ranks are subspe- 
cies, species, subgenus, genus, subtribe, tribe, subfamily, 
family, suborder, order, subclass and class. While 
subgenus, subtribe and tribe have fallen into disuse, the 
two categories phylum and domain are commonly used, 
although not officially covered by the Bacteriological 
Code. In addition, the ICSP requires publication of
all new names and combinations of names in IJSEM for
them to attain valid standing in taxonomic records.
Consequently, all new species or new combinations that
have been published in other journals must finally appear
in a Validation List published periodically by the IJSEM
journal. Classification of taxa, however, occurs in parallel
without official rules and is subject to personal opinion.
Nevertheless, genealogical inferences of Bacteria and
Archaea based on the 16S rRNA gene (15) are still
regarded as the most accepted classification scheme (11).

BACTERIAL AND ARCHAEAL TAXONOMY IN SILVA 

SILVA predominantly uses phylogenetic classification 
based on an SSU guide tree. Classification and clade 
names are informed by widely accepted sources, and 
discrepancies are resolved with the overall aim of 
making classification consistent with phylogeny. With 
release 100 in 2009, the SILVA full-length (>1200 bases 
for Bacteria/Eukaryota and >900 bases for Archaea) SSU 
gene guide tree went through a major manual curation 
effort to represent bacterial and archaeal taxa as groups 
in the tree. The core of this guide tree is based on the full-length
sequence tree of the ARB 2004 release (curated and
distributed by Wolfgang Ludwig), and is built by adding
new sequences using the ARB parsimony tool in combination
with filters to remove highly variable positions (16).
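
The length criterion for the full-length guide tree can be illustrated with a minimal Python sketch. Only the thresholds (>1200 bases for Bacteria and Eukaryota, >900 bases for Archaea) come from the text; the function name, the dictionary and the synthetic records are hypothetical and do not reflect the actual SILVA pipeline.

```python
# Minimum lengths quoted in the text; a sequence must be LONGER than the cutoff.
MIN_LENGTH = {"Bacteria": 1200, "Eukaryota": 1200, "Archaea": 900}


def is_full_length(domain: str, sequence: str) -> bool:
    """Return True if the sequence exceeds the cutoff for its domain."""
    # Count only base characters so that alignment gaps do not inflate length.
    n_bases = sum(1 for ch in sequence if ch not in "-.")
    return n_bases > MIN_LENGTH[domain]


# Synthetic placeholder records (domain, sequence); not real rRNA genes.
records = [
    ("Bacteria", "A" * 1450),
    ("Bacteria", "A" * 1100),
    ("Archaea", "A" * 950),
    ("Archaea", "A" * 900),
]
kept = [is_full_length(d, s) for d, s in records]
print(kept)  # [True, False, True, False]
```

Gap characters are ignored on the assumption that sequences may arrive in aligned form; unaligned input passes through unchanged.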

In the following releases, the curated classifications were
extended to cover bacterial and archaeal full-length large 
subunit (LSU, 23S rRNA) and eukaryotic full-length SSU 
(18S rRNA) gene sequences. Finally, with the SILVA 
release 115 in August 2013, all quality-checked SSU and 
LSU rRNA gene sequences from all three domains of life 
were automatically classified based on the established SSU 
and LSU reference taxonomies. The bacterial and 
archaeal classification in SILVA is based on Bergey's 
Taxonomic Outlines (17-20). Because both taxonomy 
and species are dynamic entities, changes are rapid and 
supplemental resources are required. In such cases, name
changes and taxonomic outlines are adapted from LPSN
(9). Moreover, the LPSN resource is also used to track
down names without standing in nomenclature (i.e. not
validly published taxa) and Candidatus taxa.

The curation of classification in SILVA uses a phylo- 
genetic tree-based process, whereas LPSN and Bergey's 
classifications are not explicitly phylogenetic; thus, topo- 
logical differences between the SILVA Ref trees 
and other resources are expected. Most notably in 
Proteobacteria, where the Bergey's taxonomic framework 
requires updates based on new phylogenetic findings, such 
discrepancies are observed. For example, the genus 
Ahrensia (type species accession: D88524) is classified 
under family Rhodobacteraceae of Alphaproteobacteria 
in LPSN; however, in the SSU Ref guide tree, this genus 
is grouped together with members of the family 
Phyllobacteriaceae. Normally, introducing polyphyletic
groups accommodates such discrepancies, but in this
case genus Ahrensia is kept under Phyllobacteriaceae
because of high sequence identities (>94%) observed 
with other members of this family. Furthermore, in 
an effort to standardize the number of ranks to exactly 
six (domain, phylum, class, order, family and genus), 
subclasses and suborders have been omitted for some 
groups, as opposed to Bergey's or LPSN's 
recommendations. 
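
As a rough illustration of what a fixed six-rank scheme means for downstream tools, the following Python sketch maps a taxonomy path onto the six rank names listed above. The semicolon-separated string format, the function name and the example path (including the order Rhizobiales, filled in for illustration) are assumptions, not a description of the SILVA file format.

```python
RANKS = ("domain", "phylum", "class", "order", "family", "genus")


def assign_ranks(path: str) -> dict:
    """Map a semicolon-separated six-level path onto the six rank names."""
    names = [part.strip() for part in path.split(";") if part.strip()]
    if len(names) != len(RANKS):
        raise ValueError(f"expected {len(RANKS)} levels, got {len(names)}")
    return dict(zip(RANKS, names))


# Illustrative path for the Ahrensia example discussed in the text.
example = (
    "Bacteria;Proteobacteria;Alphaproteobacteria;"
    "Rhizobiales;Phyllobacteriaceae;Ahrensia"
)
ranks = assign_ranks(example)
print(ranks["family"])  # Phyllobacteriaceae
```

A path that still carries a subclass or suborder would have more than six levels and is rejected here, which is the practical reason for omitting those intermediate ranks.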

In addition to this traditional taxonomic backbone, ex- 
tensive effort is spent in every release to represent prom- 
inent clades known only from environmental sequences. 
The majority of these clades and groups are annotated in 
the guide tree based on literature surveys, and occasionally 
based on personal communications; therefore, not all of
these clades are available in publications. Some examples
are the OCS116 clade (21), the SAGMC and SAGME groups (22)
and termite clusters (23). Supplementary Table S1
provides a full list of all such clades and groups that are
part of the current SILVA taxonomy. We chose to name
phylogenetically coherent groups above the family rank, 
consisting of only sequences from uncultured organisms, 
after the clone name of the earliest submitted sequence. 
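
The naming convention for groups made up only of uncultured sequences can be sketched as follows. This is a minimal Python illustration under stated assumptions: the record layout (clone name plus submission date) and the clone names are invented for the example and are not SILVA data.

```python
from datetime import date

# Hypothetical records: (clone name, submission date). Not real SILVA data.
clade_members = [
    ("cloneB12", date(2005, 3, 14)),
    ("cloneA07", date(2003, 11, 2)),
    ("cloneC33", date(2008, 6, 21)),
]


def name_clade(members):
    """Return the clone name of the earliest submitted sequence."""
    if not members:
        raise ValueError("a clade needs at least one sequence")
    earliest = min(members, key=lambda rec: rec[1])
    return earliest[0]


clade_name = name_clade(clade_members)
print(clade_name)  # cloneA07
```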



EUKARYOTIC TAXONOMY IN SILVA 

Historically, eukaryotic classification has been more inten- 
sively studied than bacterial and archaeal taxonomy, and 
is governed by the zoological and botanical nomenclatural 
codes (ICZN and ICN). The taxonomic landscape for mi- 
crobial Eukaryota is complicated because lineages are 
governed by one or both codes but fit neither (24,25), 
and classification has gone through major upheavals 
in recent times (26,27). These complications plus the 
fact that protists are infrequently included in microbial 
ecology studies (28) are most likely the reasons for 
SILVA being unique in its inclusion of sequences and 
taxonomies of all three domains of life, although a
curated database dedicated exclusively to eukaryotic 
sequences has recently become available (29). 

Eukaryotic taxonomy within the SILVA database has 
been significantly improved over the past years, after the 
inception of the Eukaryotic Taxonomy Working Group
(ETWG; http://www.arb-silva.de/projects/eukaryotic-taxonomy).
With SILVA release 111 in 2012, ETWG
implemented a phylogenetic tree-guided curated 
taxonomy for Eukaryota based on the consensus views 
of the International Society of Protistologists (26,30), 
which focused on the higher taxonomic levels (e.g. 
Opisthokonta, Stramenopiles, Excavata). The taxonomy 
has been further improved with SILVA release 115 to 
adhere to the latest publication from the ISOP 
taxonomy committee (26). Moreover, higher-level ranks 
have been revised in SILVA release 115 for plants, 
fungi, and animals, although these groups are still to 
be considered 'work in progress' and their respective 
classifications should be viewed as provisional. 

One aspect of taxonomy particularly important for integration
with computational tools is the number of ranks
used to classify an organism. Ideally, all life would be
neatly classifiable into a fixed number of ranks (e.g. the
seven Linnean ranks), implying that ranks are directly
comparable. However, the different taxonomies (i.e. for
plants and for Bacteria and Archaea) are constructed
with different definitions of their units (e.g. species) and
often, intermediate ranks are useful to better resolve the
hierarchy of a particular group of organisms. Thus, the
meaning of ranks and the number used to classify a taxon
vary widely across the tree of life (30,31). Taxa such as
vertebrate animals are represented by 15 levels of taxonomy,
while some microbial lineages, such as some lineages in
Amoebozoa (Fractovitelliida), are merely represented by
three. Moreover, the degree of genetic divergence and
evolutionary distance encompassed by a given rank also
varies enormously, such that a genus can have no variation
in the SSU rRNA gene at all (e.g. fungi) or more
than 10% (in some protist lineages) (32).

To accommodate variations in the number of nested
levels used to classify different lineages and to increase
the stability of classification, ISOP has adopted rankless
classification (24,25,30), i.e. the nested position of the
taxon in the taxonomic hierarchy does not imply a
Linnean rank. The SILVA eukaryotic classification
reflects this fluidity by reporting the full taxonomic
string for lineages regardless of the number of levels
included. However, we recognized the difficulties this
causes in computational analyses. To address this, a
table of classification rank designations is provided with
each full release of SILVA to help users address the practical
challenges of using bioinformatic tools that are
designed for, or require, explicit rank information (33).
The highest-level groups of Eukaryota and their
designated ranks are given in Table 1. Within the table,
rank designations adhere to the overall goal of assigning
the same rank to roughly equivalent evolutionary levels
across the tree. Only taxonomic levels that are distinguishable
in the SILVA guide tree (also included in releases) are
included in the table file; therefore, several levels of
animal, plant and fungal taxonomy are missing. These
ranks serve as a guideline, and we welcome users to
modify them according to their specific analysis needs.
Like taxonomy as a whole, the curation of eukaryotic
classification within SILVA is a work in progress and
will continue to evolve. Suggestions for revisions to the
taxonomy for eukaryotic clades are welcome, and users
are encouraged to contact the SILVA team at
contact@arb-silva.de.

Table 1. Major eukaryotic lineages represented at the highest level of
the taxonomic hierarchy in the current SILVA release, and a
comparison to the highest level of the Protist Ribosomal Reference
Database (PR2)

SILVA 115                                  PR2                         Rank
Amoebozoa                                  Amoebozoa                   Kingdom
Archaeplastida                             Archaeplastida              Major clade
-                                          Apusozoa                    -
Centrohelida                               Hacrobia (Centroheliozoa)   Kingdom
Cryptophyceae                              Hacrobia (Cryptophyta)      Kingdom
Excavata                                   Excavata                    Major clade
Haptophyta                                 Hacrobia (Haptophyta)       Kingdom
Incertae Sedis                             -                           Kingdom
Opisthokonta                               Opisthokonta                Major clade
Picozoa                                    Hacrobia (Picobiliphyta)    Phylum
SAR (Stramenopiles, Alveolata, Rhizaria)   Alveolata                   Major clade
                                           Rhizaria
                                           Stramenopiles

The Rank column is provisional, as explained in the Eukaryotic
Taxonomy section, and only refers to the lineages from the SILVA 115
release.
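
The use of a rank designation table alongside rankless taxonomy strings, as described in this section, can be sketched in Python. The rank values below are copied from Table 1; the dictionary layout, the function name and the third path level ("Example_lineage") are hypothetical and do not represent the format of the file shipped with SILVA releases.

```python
# Rank designations for a few top-level groups, copied from Table 1.
# The dictionary layout is an assumption, not the format of the released file.
RANK_TABLE = {
    "Amoebozoa": "Kingdom",
    "Archaeplastida": "Major clade",
    "Excavata": "Major clade",
    "Opisthokonta": "Major clade",
    "Picozoa": "Phylum",
}


def annotate(path: str, rank_table: dict) -> list:
    """Pair every level of a taxonomy path with its designated rank, if any."""
    levels = [p.strip() for p in path.split(";") if p.strip()]
    return [(name, rank_table.get(name)) for name in levels]


# Paths may have any number of levels; levels without a designation get None.
annotated = annotate("Eukaryota;Amoebozoa;Example_lineage", RANK_TABLE)
print(annotated)
# [('Eukaryota', None), ('Amoebozoa', 'Kingdom'), ('Example_lineage', None)]
```

Keeping the full path and attaching ranks only where a designation exists leaves the rankless classification intact while giving rank-dependent tools something to work with.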

LTP TAXONOMY 

The LTP is an initiative for the development of highly
curated 16S and 23S rRNA gene sequence databases,
universal alignments and reference phylogenetic trees of
all the type strains of Bacteria and Archaea (34). The
LTP taxonomy represents the nomenclature and classification
of all bacterial and archaeal taxa with validly published
names as given in LPSN. Like SILVA, LTP is an
authorized provider of the LPSN taxonomy. LPSN, as a
partner of the LTP team, has facilitated access
to TOBA, the NCBI taxonomy and the Taxonomic
Outlines of the Bergey's Manual of Systematic
Bacteriology, which for the first time appear in a
database-like format inside the LTP's ARB-formatted
and CSV exports. The delay of several months between
IJSEM and LTP releases may cause slight variations
between LTP's and LPSN's taxonomy, with LPSN
having the most recently updated taxonomy. The LTP taxonomy
is distributed over four fields of information: (1)
fullname_ltp corresponds to the species or subspecies
name; (2) high_tax_ltp is the name of the next higher
taxon above genus; (3) type_ltp contains information
about type species that are the nomenclatural types for
the higher taxa above species; and (4) tax_ltp contains
the complete classification into higher ranks as it appears
in LPSN. More information can be found at:
http://www.arb-silva.de/projects/living-tree.
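
The four LTP fields can be pictured as one record per type strain. The sketch below is a minimal Python illustration that assumes a comma-separated export whose header row uses the four field names; the actual column layout of the LTP CSV export is not specified here, and the sample row is made up.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class LtpRecord:
    fullname_ltp: str  # species or subspecies name
    high_tax_ltp: str  # next higher taxon above genus
    type_ltp: str      # type-species information
    tax_ltp: str       # complete classification into higher ranks


def read_ltp(handle):
    """Parse rows whose header line carries the four LTP field names."""
    return [
        LtpRecord(
            row["fullname_ltp"],
            row["high_tax_ltp"],
            row["type_ltp"],
            row["tax_ltp"],
        )
        for row in csv.DictReader(handle)
    ]


# Made-up example row; the values and the column layout are assumptions.
sample = io.StringIO(
    "fullname_ltp,high_tax_ltp,type_ltp,tax_ltp\n"
    "Example species,Exampleaceae,type sp.,Bacteria;Examplephylum;Exampleaceae\n"
)
ltp_records = read_ltp(sample)
print(ltp_records[0].high_tax_ltp)  # Exampleaceae
```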



COMPARISON OF SILVA TAXONOMY WITH RDP-II 
AND GREENGENES 

The SILVA, RDP-II and Greengenes databases have different
approaches for obtaining a taxonomic hierarchy,
and the subsequent classification of rRNA gene sequences
(35,36). For example, Greengenes uses a mixture of different
resources (own curation, NCBI, SILVA, RDP-II),
while RDP-II is based on Bergey's outline with minor 
additions from NCBI. Here, we provide a brief compari- 
son of the three taxonomic hierarchies based on rank 
names at the phylum and genus level, based on the 
taxonomies taken from RDP-II and Greengenes as of 
May 2013. 

SILVA release 115 consists of 46 taxa at phylum level,
which include both commonly agreed 'named' phyla and
the widely accepted candidate divisions. All named phyla
between the three databases are congruent, except for
Greengenes, which is missing Korarchaeota (Table 2).
Most disagreement occurred for candidate divisions. In
SILVA, we currently designate 12 divisions, 6 of which
are shared with RDP-II and 9 of which are shared with
Greengenes. It is important to note that the same candidate
divisions might be named differently in each
database. In addition to named phyla and the candidate
divisions, both SILVA and Greengenes have a number of
phylum-level taxa consisting only of environmental
sequences. Supplementary Table S2 provides an
overview of these. While some of these groups are quite
small, comprising only a few sequences that could be treeing
artifacts, those that collect >100 sequences certainly
require attention and sound coherence testing via different
treeing approaches.
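
Because the same candidate division may carry a different label in each database, names usually need light normalization before they are compared. The Python sketch below shows one possible approach, assuming the only difference is a leading "Candidate division" label as seen in Table 2; the function name is hypothetical, and real naming differences may need a curated mapping instead.

```python
PREFIX = "candidate division "


def normalize(name: str) -> str:
    """Drop a leading 'Candidate division ' label and surrounding spaces."""
    cleaned = name.strip()
    if cleaned.lower().startswith(PREFIX):
        cleaned = cleaned[len(PREFIX):]
    return cleaned.strip()


# A subset of names as listed in Table 2 for SILVA and RDP-II.
silva = ["Candidate division OP11", "Candidate division TM7", "Candidate division JS1"]
rdp = ["OP11", "TM7", "SR1"]

shared = sorted({normalize(n) for n in silva} & {normalize(n) for n in rdp})
print(shared)  # ['OP11', 'TM7']
```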

In the next step, we investigated shared named and 
Candidatus taxa at the genus level. Here, the results 
were quite different from the phylum-level comparison, 
as both the number of taxa and their names differed
between the three databases (Figure 1). 

The RDP-II and SILVA projects share the largest
number of taxa at the genus level, while SILVA and
Greengenes share the fewest. Furthermore, SILVA has
the highest number of unique taxa included in the classification,
followed by Greengenes and RDP-II. The higher
number of unique taxa in SILVA can be attributed to the
inclusion of Candidatus taxa and taxa without standing in
nomenclature taken from LPSN. This comparison also
illustrates the differences (and similarities) in the
curation procedure and the resources used for curation;
SILVA and RDP-II appear to use the same resources and
follow the same guidelines. Finally, it is important to point
out that despite the differences between the curation methods
used, the three databases still share a sizeable number of
taxa with each other.
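
A three-way comparison of this kind reduces to set arithmetic on taxon names. The following Python sketch computes the seven Venn regions for three name sets; it is not the procedure used for Figure 1, and the toy input uses a few phylum names from Table 2 rather than the genus-level data.

```python
def venn_regions(a: set, b: set, c: set) -> dict:
    """Split three sets into the seven regions of a Venn diagram."""
    return {
        "all_three": a & b & c,
        "a_b_only": (a & b) - c,
        "a_c_only": (a & c) - b,
        "b_c_only": (b & c) - a,
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
    }


# Toy input built from a few phylum names in Table 2 (not genus-level data).
silva = {"Firmicutes", "Korarchaeota", "Nitrospirae", "Spirochaetae"}
rdp = {"Firmicutes", "Korarchaeota", "Nitrospira", "Spirochaetes"}
greengenes = {"Firmicutes", "Nitrospirae", "Spirochaetes"}

regions = venn_regions(silva, rdp, greengenes)
counts = {region: len(names) for region, names in regions.items()}
print(counts)
```

Exact string matching treats spelling variants such as Spirochaetae and Spirochaetes as different taxa, so the counts reflect differences in names as well as in content.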



OUTLOOK 

SILVA and LTP taxonomies will be maintained with high
diligence into the future. A primary focus of the taxonomic
efforts in SILVA will be continual improvements
of the specific eukaryotic groups, i.e. fungi, plants and
animals.

A further objective is the reconciliation of the classifications
used by all rRNA gene databases. The aim of such
a project would be, at the very least, to provide users with
the exact same name for a given taxon, regardless of the
data set and classification method used. Talks are
underway to implement reconciliation for Bacteria and
Archaea. As there is good agreement on named taxa,
such reconciliation should be relatively straightforward
and mainly involving better sharing of data.
Reconciliation of clades and unnamed groups, however,
will require more efforts. As the ETWG working group
also consists of Greengenes and RDP-II colleagues, such
reconciliation will not be necessary for Eukaryota.

Table 2. Comparison of the phyla and candidate divisions between
SILVA, RDP-II and Greengenes

SILVA                       RDP-II                      Greengenes
Acidobacteria               Acidobacteria               Acidobacteria
Actinobacteria              Actinobacteria              Actinobacteria
Aquificae                   Aquificae                   Aquificae
Armatimonadetes             Armatimonadetes             Armatimonadetes
Bacteroidetes               Bacteroidetes               Bacteroidetes
Caldiserica                 Caldiserica                 Caldiserica
Chlamydiae                  Chlamydiae                  Chlamydiae
Chlorobi                    Chlorobi                    Chlorobi
Chloroflexi                 Chloroflexi                 Chloroflexi
Chrysiogenetes              Chrysiogenetes              Chrysiogenetes
Crenarchaeota               Crenarchaeota               Crenarchaeota
Cyanobacteria               Cyanobacteria/Chloroplast   Cyanobacteria
Deferribacteres             Deferribacteres             Deferribacteres
Deinococcus-Thermus         Deinococcus-Thermus         Thermi
Dictyoglomi                 Dictyoglomi                 Dictyoglomi
Elusimicrobia               Elusimicrobia               Elusimicrobia
Euryarchaeota               Euryarchaeota               Euryarchaeota
Fibrobacteres               Fibrobacteres               Fibrobacteres
Firmicutes                  Firmicutes                  Firmicutes
Fusobacteria                Fusobacteria                Fusobacteria
Gemmatimonadetes            Gemmatimonadetes            Gemmatimonadetes
Korarchaeota                Korarchaeota                -
Lentisphaerae               Lentisphaerae               Lentisphaerae
Nanoarchaeota               Nanoarchaeota               Nanoarchaeota
Nitrospirae                 Nitrospira                  Nitrospirae
Planctomycetes              Planctomycetes              Planctomycetes
Proteobacteria              Proteobacteria              Proteobacteria
Spirochaetae                Spirochaetes                Spirochaetes
Synergistetes               Synergistetes               Synergistetes
Tenericutes                 Tenericutes                 Tenericutes
Thaumarchaeota              Thaumarchaeota              Thaumarchaeota
Thermodesulfobacteria       Thermodesulfobacteria       Thermodesulfobacteria
Thermotogae                 Thermotogae                 Thermotogae
Verrucomicrobia             Verrucomicrobia             Verrucomicrobia
Candidate division BRC1     BRC1                        BRC1
Candidate division JS1      -                           -
Candidate division KB1      -                           -
Candidate division OD1      OD1                         ABY1_OD1
Candidate division OP11     OP11                        OP11
Candidate division OP3      -                           OP3
Candidate division OP8      -                           OP8
Candidate division OP9      -                           OP9
Candidate division SR1      SR1                         SR1
Candidate division TM7      TM7                         TM7
Candidate division WS3      WS3                         WS3
Candidate division WS6      -                           WS6

Differences are marked in bold.

Figure 1. Venn diagram showing the number of shared taxa at genus 
level between SILVA, RDP-II and Greengenes. Only taxa containing 
sequences from cultivated organisms are included in this comparison. 
The overlapping part in the middle shows the number of all taxa jointly 
shared by all three databases; the other overlaps show taxa shared 
between two databases, but not the third. RDP-II and Greengenes 
share no other taxa in addition to the 949 shared jointly by all three 
databases. 



With the newly developed SILVA-NGS pipeline, we
will grant easy access to the SILVA taxonomy for the
classification of rRNA gene amplicon data. SILVA-NGS
accepts any kind of short- and long-read rRNA
gene sequence data in FASTA format and performs quality control,
alignment and classification of rRNA genes based on the
curated SILVA taxonomy. All steps (upload, progress
monitoring, visualization of results and download of
data) can be controlled via the SILVA-NGS web interface.
The system is available at www.arb-silva.de/ngs.

Finally, we would like to emphasize that the SILVA
taxonomy curation is an open and transparent process,
and input from users and experts is highly appreciated.



SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online.